The preservation of the environment has become a priority and is receiving 
growing attention. This is particularly true in precision agriculture, where 
pesticide and herbicide use has become more tightly controlled. In this study, 
we evaluate the ability of deep learning (DL) and convolutional neural 
networks (CNNs) to detect weeds in several types of crops using perspective 
and proximity images, in order to enable localized and ultra-localized 
herbicide spraying in the region of Beni Mellal in Morocco. We studied weed 
detection with six recent CNNs known for their speed and precision, namely 
VGGNet (16 and 19), GoogLeNet (Inception V3 and V4) and MobileNet 
(V1 and V2). The first experiment trained the CNN architectures from 
scratch and the second used their pre-trained versions. The results showed 
that Inception V4 achieved the highest accuracy on the mixed image set, with 
99.41% for its version trained from scratch and 99.51% for its pre-trained 
version, and that MobileNet V2 was the fastest and lightest, with a size of 
14 MB. 


1. INTRODUCTION 


Nowadays, artificial intelligence (AI) has made its way into our daily lives with the emergence of 
embedded systems and the internet of things (IoT). It is particularly present in image processing and 
recognition. From improving the quality of photos taken by smartphones to security, access control, 
environmental recognition and analysis for autonomous vehicles, and medical diagnostics, AI is everywhere. 
Embedded systems are increasingly called upon to analyze ever larger and more resource-intensive images. 

In precision agriculture, several technologies have been developed during the last decade to 
detect weeds and achieve localized and selective spraying. In our previous work, we combined 
Haar-like features with the AdaBoost algorithm to achieve real-time weed detection in the inter-row of 
different crops [1]. We also developed a new adjacency descriptor for the selection of weeds (monocot 
or dicot) to achieve real-time selective spraying [2]. 

In terms of machine learning, several other supervised and unsupervised learning algorithms have 
been developed for vegetation segmentation. Among the supervised learning methods, we cite the use of random 
forest [3], convolutional neural network (CNN) [4], decision trees [5], support vector machines [6], back 
propagation neural network [7], and Bayesian classifier [8]. Unsupervised methods focus essentially on 
K-means clustering [9] and K-means clustering based on particle swarm optimization (PSO) [10]. Recently, 
Wang et al. [11] and Liakos et al. [12] presented detailed reviews of several artificial vision techniques 
applied to weed detection. 

In agriculture, several problems have been solved by these developments. Recently, the authors of [13] 
proposed and tested the use of DL with CNNs for tomato plant disease classification. Zaki et al. [14] 
confirmed that MobileNet V2 correctly classified various tomato plant diseases from leaf images. This 
finding was corroborated in [15], where the authors concluded that the DL model outperforms 
the support vector machine (SVM) by a fairly substantial margin in terms of classification accuracy. 

Neural networks, and CNNs in particular, are in great demand for image classification, object 
recognition, face recognition and fine-grained classification. However, these networks perform 
convolutions, which are very expensive operations in terms of computation and memory. Image classification, 
or even object recognition, on embedded systems therefore represents a major challenge due to hardware 
constraints. 

In this study, we evaluate the ability of DL and CNNs to detect weeds in 
several types of crops to enable localized and ultra-localized herbicide spraying. We present two datasets of 
RGB images from different cultivated plots. These datasets cover regions of interest of 2x3 m² and 0.5x0.5 m² 
collected in the region of Béni Mellal in Morocco. We then study weed detection with six recent 
CNNs known for their speed and precision, namely VGGNet (16 and 19), GoogLeNet (Inception V3 and V4) 
and MobileNet (V1 and V2). 


2. RESEARCH METHOD 

In this section, we describe the details of the implemented CNNs for weed detection in RGB images 
of different crops. This procedure is divided into four steps: acquisition, learning, classification, and data 
evaluation. These steps are detailed below. 


2.1. Data acquisition 

In this research, two sets of images of several cereal and vegetable crops were used to develop a 
reliable and low-cost system able to detect and spray weeds in real time. The first set of images comes from 
a tilted camera mounted at the front of the spray tractor, which captures a tilted region of interest of 
about 2x3 m² used to enable localized herbicide spraying. The second set of images comes from a vertical 
camera, which captures a proximity region of interest of about 0.5x0.5 m² used to enable ultra-localized 
herbicide spraying. The concept of our real-time spray system is shown in Figure 1. 


Figure 1. Real-time spraying system concept 


Tilted images: These images are extracted from video sequences of several crops acquired for real-time 
processing [1]. For the current study, each image is first cropped and masked to keep only the region of 
interest (ROI) of about 3x2 m² in front of the moving tractor. It is then saved in the output format shown in 
Figure 2. 

Vertical images: These proximity images are taken vertically to better detect, recognize and classify 
weeds in the rows and inter-row spacing of different crops [2]. In the present study, these images show an ROI of 
about 0.5x0.5 m² of several crops. Each image is then cropped to the output format shown in Figure 3. 


Figure 3. Proximity images: healthy bean crop (A) and infested beet crop (B) 


These images were taken at different times of the day and on different cultivated plots. We ensured 
that a wide range of content was covered in the non-weed class images. We also included in the weed class 
several images with visual similarities to those of the non-weed class, thus making the classification task 
more difficult. 

In total, each set of images consisted of 1000 images, composed of 50% images of healthy crops 
(negative class "0") and 50% images of crops infested with weeds (positive class "1"). We then 
increased the number of images by adjusting the image sharpness, brightness and contrast using 
the IrfanView image processing software (version 4.54; http://www.Irfanview.com). This kind of data 
augmentation is commonly used for more effective training with small datasets. Consequently, the size of each 
dataset increased from 1000 images to 10,000 patches. In this way, we constructed three datasets (G1, G2 and G3) 
composed of tilted, proximity and mixed images, respectively. Each dataset consisted of 8000 patches for 
training, 1000 patches for validation and 1000 patches for testing. 
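
The augmentation and splitting steps described above can be illustrated with a minimal Python sketch. It is not the IrfanView procedure used in the study: the pixel formula, the nine brightness and contrast settings, the stratified shuffling and all function names are assumptions chosen only to reproduce the 1000 to 10,000 expansion and the 8000/1000/1000 split.

```python
import random

def adjust_pixel(value, brightness=1.0, contrast=1.0, mean=128.0):
    """Scale contrast around a mean grey level, then scale brightness; clamp to 0..255."""
    out = (mean + contrast * (value - mean)) * brightness
    return max(0, min(255, round(out)))

def augment(image, settings):
    """Return one modified copy of a grayscale image (list of rows) per (brightness, contrast) pair."""
    return [[[adjust_pixel(p, b, c) for p in row] for row in image] for b, c in settings]

def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split sample indices into train/validation/test while keeping the class balance."""
    rng = random.Random(seed)
    train, val, test = [], [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(fractions[0] * len(idx))
        n_val = round(fractions[1] * len(idx))
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:])
    return train, val, test

# Hypothetical settings: 9 modified copies + the original = 10 patches per image.
SETTINGS = [(b, c) for b in (0.8, 1.0, 1.2) for c in (0.8, 1.0, 1.2)]
image = [[100, 200], [50, 128]]
variants = [image] + augment(image, SETTINGS)

# 1000 images x 10 variants = 10,000 patches, half per class.
labels = [0] * 5000 + [1] * 5000
train, val, test = stratified_split(labels)
```

Splitting per class keeps the 50/50 balance of healthy and infested patches in each subset. Whether the study split before or after augmentation is not stated, so the order used here is only one possibility.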


2.2. Data training 
Using the image sets described above, the performance of the following CNNs 
was evaluated for the problem of weed detection in different crops. 


2.2.1. CNN architecture 

The CNN architecture is based on a deep neural network structure. It consists of several layers, 
including convolution, ReLU, pooling, flattening, and fully connected layers. At the end, a probabilistic 
distribution is used to generate the classes, as shown in Figure 4. To determine the content of an image, the 
network processes it in two phases: i) the feature extraction phase, consisting of several successive 
convolution layers, which extracts characteristic features of the image; ii) the classification phase, which 
predicts the class of the input image (for example, a house, a car, or an animal). 

In our case, the number of output classes is reduced to two: the weed class 
and the weed-free class. AlexNet [16] was the first CNN model to gain wide prominence; it demonstrated 
that CNNs offer a good alternative for solving different image recognition problems. Following 
this model, many other CNN models have been developed, for example VGGNet [17], GoogLeNet [18], and 
MobileNets [19]. Each new version tries to further improve the performance of the CNN in terms of size, 
accuracy, and speed. 
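
To make the two phases concrete, the following pure-Python sketch chains one convolution, ReLU, pooling, flattening, one fully connected layer and SoftMax into a two-class output. It is only an illustration under simplifying assumptions (a single channel, a single kernel, hand-picked weights); the function names are hypothetical and this is not the implementation evaluated in this study.

```python
import math

def conv2d(image, kernel):
    """'Valid' 2-D convolution (cross-correlation, as in most DL libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(w)] for i in range(h)]

def relu(fmap):
    """Set negative activations to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling with a size x size window."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

def flatten(fmap):
    return [v for row in fmap for v in row]

def dense(vector, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, biases)]

def softmax(logits):
    """Turn raw scores into a probability distribution over the classes."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward(image, kernel, weights, biases):
    features = flatten(max_pool(relu(conv2d(image, kernel))))  # feature extraction
    return softmax(dense(features, weights, biases))           # classification

# 5x5 input -> 4x4 feature map -> 2x2 after pooling -> 4 features -> 2 classes.
image = [[1.0] * 5 for _ in range(5)]
kernel = [[1.0, 0.0], [0.0, -1.0]]
weights = [[0.5, -0.5, 0.25, 0.1], [0.2, 0.3, -0.1, 0.4]]
probs = forward(image, kernel, weights, [0.0, 0.0])  # [P(weed-free), P(weed)]
```

On this constant image the kernel responds with zeros everywhere, so both classes receive the same probability; a real network learns kernels and weights that separate the two classes.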


Figure 4. CNN architecture (input image; feature extraction: convolution kernel, ReLU, pooling, feature maps; 
classification: flatten layer, fully connected layer, SoftMax activation function giving a probabilistic 
distribution; output classes: crops and weeds) 


2.2.2. VGGNets 

VGGNet is a CNN proposed by Simonyan and Zisserman in 2014 [17]. The VGG network has several 
versions, of which VGG16 and VGG19 are the best known. VGG16 has thirteen convolutional layers and 
three fully connected layers, while VGG19 has sixteen convolutional layers and three fully connected layers. 
In both versions, the convolutional layers are followed by two fully connected layers with 4096 channels each 
and a third fully connected layer with 1000 channels to predict 1000 classes. The last fully connected layer 
feeds a SoftMax layer for classification. Owing to its greater depth, VGG19 performs better than 
VGG16, but this gain comes at the cost of a larger model size. 
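
The effect of depth on model size can be checked with a short parameter count. The sketch below assumes the standard ImageNet configuration (3x3 convolutions with biases, a 224x224 RGB input and a 1000-class head) and 32-bit weights; the function names are illustrative.

```python
# Output channels of each 3x3 convolution; "M" marks a 2x2 max pooling.
VGG16 = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
         512, 512, 512, "M", 512, 512, 512, "M"]
VGG19 = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
         512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]

def count_parameters(config, input_size=224, in_channels=3,
                     fc_units=(4096, 4096, 1000)):
    """Count weights and biases of the convolutional and fully connected layers."""
    params, channels, size = 0, in_channels, input_size
    for layer in config:
        if layer == "M":
            size //= 2                                    # pooling halves width and height
        else:
            params += 3 * 3 * channels * layer + layer    # weights + biases
            channels = layer
    features = channels * size * size                     # flattened feature vector
    for units in fc_units:
        params += features * units + units
        features = units
    return params

def size_mib(params, bytes_per_weight=4):
    """Storage needed for the parameters, in mebibytes."""
    return params * bytes_per_weight / 2**20

vgg16_params = count_parameters(VGG16)   # about 138 million
vgg19_params = count_parameters(VGG19)   # about 144 million
```

Under these assumptions the three extra convolutional layers of VGG19 add roughly 5.3 million parameters, and most of the storage in both networks comes from the first fully connected layer.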


2.2.3. GoogLeNet 

GoogLeNet is a CNN created by Szegedy et al. [18]. The first version, called Inception V1, has 
twenty-two layers and introduces a building block known as the Inception module. This module 
combines convolutions of different sizes (1x1, 3x3, and 5x5) with a 3x3 max pooling 
branch. Inception V1 won the ILSVRC14 competition ahead of VGGNets. 

Following this success, other versions were developed to increase performance. The 
second version, called Inception V2 [20], uses the batch normalization technique to improve the learning 
performance. The third version, Inception V3 [21], factorizes the convolutions within the Inception 
module to improve speed. The last version is Inception V4 [22]. It adopts a more uniform 
architecture and uses more Inception modules. 
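
As an illustration of why factorizing convolutions improves speed, the sketch below counts the weights of a 5x5 convolution against two stacked 3x3 convolutions, and of an n x n convolution against a 1 x n followed by an n x 1 convolution. These are two factorizations commonly associated with Inception V3; the channel count of 64 and the helper names are assumptions made for the example.

```python
def conv_weights(kernel_h, kernel_w, in_channels, out_channels):
    """Number of weights of one convolution layer (biases ignored)."""
    return kernel_h * kernel_w * in_channels * out_channels

def five_by_five(channels):
    return conv_weights(5, 5, channels, channels)

def two_three_by_three(channels):
    """Two stacked 3x3 convolutions cover the same 5x5 receptive field."""
    return 2 * conv_weights(3, 3, channels, channels)

def asymmetric_pair(n, channels):
    """An n x n convolution replaced by a 1 x n followed by an n x 1 convolution."""
    return (conv_weights(1, n, channels, channels)
            + conv_weights(n, 1, channels, channels))

channels = 64  # hypothetical channel count
ratio_5x5 = two_three_by_three(channels) / five_by_five(channels)             # 18/25
ratio_3x3 = asymmetric_pair(3, channels) / conv_weights(3, 3, channels, channels)  # 6/9
```

The ratios do not depend on the channel count: the stacked 3x3 pair needs 28% fewer weights than the 5x5 layer, and the asymmetric pair needs one third fewer than the 3x3 layer.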


2.2.4. MobileNet 

MobileNet [19] is a compact CNN intended principally for mobile devices. The basic idea is to 
replace the classical convolution with two separate operations (a depthwise convolution followed by a 
pointwise convolution). The depthwise convolution filters each input channel spatially, along the width and 
height dimensions, and the pointwise (1x1) convolution combines the resulting channels along the 
depth direction [23], [24]. MobileNet offers a high level of performance compared to other CNNs while 
maintaining a reduced size. Given these characteristics, this CNN is ideal for embedded systems. MobileNet 
V2 is a newer version introduced in 2018 [25]. It introduces two features: the linear bottleneck and the 
inverted residual block. MobileNet V2 is smaller, with the number of parameters reduced from 4.2 M to 
3.4 M at a performance equivalent to that of the first version. 
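
The saving obtained by splitting a convolution into depthwise and pointwise steps can be quantified with a short sketch. The layer dimensions below (3x3 kernel, 256 input and output channels, 14x14 feature map) are hypothetical values chosen for illustration; biases are ignored.

```python
def standard_conv_cost(k, c_in, c_out, h, w):
    """Weights and multiplications of a classical k x k convolution on an h x w map."""
    params = k * k * c_in * c_out
    return params, params * h * w

def separable_conv_cost(k, c_in, c_out, h, w):
    """Depthwise (one k x k filter per input channel) plus pointwise (1x1) convolution."""
    depthwise = k * k * c_in      # spatial filtering, channel by channel
    pointwise = c_in * c_out      # 1x1 convolution mixing the channels
    params = depthwise + pointwise
    return params, params * h * w

std_params, std_mults = standard_conv_cost(3, 256, 256, 14, 14)
sep_params, sep_mults = separable_conv_cost(3, 256, 256, 14, 14)
reduction = sep_params / std_params   # equals 1/c_out + 1/k**2
```

For a 3x3 kernel the ratio approaches 1/9 as the number of output channels grows, which is why the separable form needs roughly eight to nine times fewer weights and multiplications per layer.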


2.3. Performance evaluation 
The weed detection ability of the studied CNNs is evaluated using the 
following metrics: sensitivity (or recall) (1), the proportion of positive examples that are correctly 
detected; specificity (2), the proportion of negative examples that are correctly identified; accuracy (3), 
the proportion of all examples that are correctly classified; precision (4), the proportion of predicted 
positives that are truly positive; and F1-Score (5), the harmonic mean of precision and sensitivity, a high 
value of which indicates good performance on the positive class. 


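$$\text{Sensitivity (Recall)} = \frac{TP}{TP + FN} \tag{1}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{2}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{4}$$

$$\text{F1-Score} = 2 \cdot \frac{\text{Precision} \cdot \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} \tag{5}$$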


where TP is the number of true positives, i.e. positive cases that are classified as positive; TN is the 
number of true negatives, i.e. negative cases that are classified as negative; FP is the number of false 
positives, i.e. negative cases that are incorrectly classified as positive; and FN is the number of 
false negatives, i.e. positive cases that are incorrectly classified as negative. 
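
Equations (1) to (5) translate directly into code. In the sketch below, the function name and the confusion-matrix counts are hypothetical, chosen to mimic a balanced test set of 1000 patches; they are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the five metrics of Eqs. (1)-(5) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                                   # Eq. (1)
    specificity = tn / (tn + fp)                                   # Eq. (2)
    accuracy = (tp + tn) / (tp + tn + fp + fn)                     # Eq. (3)
    precision = tp / (tp + fp)                                     # Eq. (4)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)   # Eq. (5)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical counts: 500 weed patches and 500 weed-free patches.
m = classification_metrics(tp=495, tn=497, fp=3, fn=5)
```

When the two classes have the same number of test examples, as in this hypothetical case, the accuracy is the mean of sensitivity and specificity, which provides a quick consistency check on reported values.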


3. RESULTS AND DISCUSSION 

In this study, we evaluated a selection of state-of-the-art CNNs, in both from-scratch and pre-trained 
versions, for weed detection using perspective and proximity images of several crop types. 
The objective was to compare these CNNs in terms of 
accuracy, precision, sensitivity, specificity, and F1-Score. All experiments were run on the 
same machine, whose specifications are given in Table 1, using Python 3.7, TensorFlow and the Keras library. 

In the first experiment, all CNNs were trained from scratch, without any pre-trained weights, for 
twenty epochs in order to obtain a precise idea of their real accuracy, training time, runtime, and size. Note 
that an epoch is one complete pass of the training set through the neural network. 
Table 2 shows the input image resolution together with the resulting size, training time and test 
time for each CNN trained from scratch on the mixed image set. These results remain practically the same for 
the tilted and proximity image sets. MobileNet V2 is the lightest 
and fastest CNN, with the smallest size, training time and test time: 14 MB, 71 min and 7.51 min, 
respectively. 


Table 1. Machine specifications 


Components        Specifications 
Processor         Intel Core i7-7700 CPU @ 3.60 GHz 
Memory            16 GB 
Graphics          GeForce GTX 1070, 8 GB 
Operating system  Windows 10, 64 bits 


Table 2. Features of CNNs trained from scratch 


CNN           Input image (pixels)  Size (MB)  Training time (min)  Test time (min) 
MobileNet V1  224 x 224             16         86                   8.15 
MobileNet V2  224 x 224             14         71                   7.51 
Inception V3  299 x 299             84         119                  10.12 
Inception V4  299 x 299             163        151                  17.25 
VGG 16        240 x 240             528        201                  29.01 
VGG 19        240 x 240             548        218                  31.53 


Tables 3, 4 and 5 show the performance measured for the different CNNs, trained from scratch, on 
the sets of tilted, proximity and mixed images, respectively. The CNNs of this study obtained 
their best performance on the set of proximity images, then on the set of mixed images and finally on the set 
of tilted images. This is because the proximity images show more detail. 
As a result, the vegetation is clear and easily differentiated in the agricultural scene, so weeds are 
better distinguished in the rows and between the rows of crops. Inception V4 obtained the 
highest scores for precision, sensitivity, specificity, accuracy, and F1-Score on all three sets of images: 
99.31%, 99.03%, 99.31%, 99.18% and 99.18%, respectively, for the tilted images; 99.55%, 
99.42%, 99.45%, 99.45% and 99.49% for the proximity images; and 99.50%, 99.27%, 99.54%, 99.41% and 
99.41% for the mixed images. 

In terms of accuracy, Inception V4 is followed by Inception V3, MobileNet V2, MobileNet V1, VGG 19 and 
finally VGG 16, which obtained 98.14%, 98.37% and 98.36% for the sets of tilted, proximity and 
mixed images, respectively. In general, as these figures show, all of the CNNs in this study recorded 
excellent weed detection results on the three sets of images. 


Table 3. Measured performance (%) of CNNs trained from scratch on the tilted images set 


Metric       VGG 16  VGG 19  Inception V3  Inception V4  MobileNet V1  MobileNet V2 
Precision    97.74   97.81   99.22         99.31         98.79         98.99 
Sensitivity  97.38   97.51   98.99         99.03         98.49         98.53 
Specificity  98.88   98.89   99.13         99.31         99.01         99.17 
Accuracy     98.14   98.21   99.07         99.18         98.76         98.86 
F1-Score     97.57   97.67   99.11         99.18         98.65         98.77 


Table 4. Measured performance (%) of CNNs trained from scratch on the proximity images set 


Metric       VGG 16  VGG 19  Inception V3  Inception V4  MobileNet V1  MobileNet V2 
Precision    98.14   98.21   99.52         99.55         99.19         99.23 
Sensitivity  98.01   98.11   99.39         99.42         99.23         99.21 
Specificity  98.71   98.99   99.41         99.45         99.43         99.47 
Accuracy     98.37   98.56   99.41         99.45         99.34         99.35 
F1-Score     98.08   98.17   99.46         99.49         99.22         99.23 


Table 5. Measured performance (%) of CNNs trained from scratch on the mixed images set 


Metric       VGG 16  VGG 19  Inception V3  Inception V4  MobileNet V1  MobileNet V2 
Precision    98.01   97.96   99.32         99.50         98.83         99.12 
Sensitivity  97.65   97.60   99.09         99.27         98.53         98.66 
Specificity  99.15   99.10   99.23         99.54         99.05         99.30 
Accuracy     98.36   98.51   99.17         99.41         98.80         98.99 
F1-Score     97.84   98.04   99.21         99.41         98.68         98.90 


Table 6 shows the weed detection results obtained on the set of mixed images by these same 
CNNs in their pre-trained versions. These results are similar to those obtained with the CNNs trained from 
scratch. Unsurprisingly, Inception V4 performs best, scoring 99.60%, 99.37%, 99.64%, 
99.51% and 99.49% in terms of precision, sensitivity, specificity, accuracy, and F1-Score, respectively. 
Its accuracy of 99.51% on the set of mixed images, compared with 99.41% when trained from scratch, shows the 
effectiveness of using pre-trained and fine-tuned CNNs for weed detection. VGG 16 remains the least accurate 
of the CNNs studied, with scores of 98.03%, 97.67%, 99.17%, 98.43% and 97.86% in terms of 
precision, sensitivity, specificity, accuracy, and F1-Score, respectively. These results confirm that all of the 
CNNs in this study recorded excellent weed detection performance on all of the image sets used. 


Table 6. Measured performance (%) of pre-trained CNNs on the mixed images set 


Metric       VGG 16  VGG 19  Inception V3  Inception V4  MobileNet V1  MobileNet V2 
Precision    98.03   98.14   99.32         99.60         98.95         99.20 
Sensitivity  97.67   97.78   99.09         99.37         98.65         98.74 
Specificity  99.17   99.28   99.23         99.64         99.17         99.38 
Accuracy     98.43   98.54   99.17         99.51         98.92         99.07 
F1-Score     97.86   97.97   99.22         99.49         98.81         98.98 


MobileNet V2, in its version trained from scratch, obtained an accuracy of 98.86%, 99.35% and 98.99% 
for the sets of tilted, proximity and mixed images, respectively. In its pre-trained version, it obtained an 
accuracy of 99.07% on the set of mixed images. MobileNet V2 is therefore ranked third in this study, but 
owing to its speed and its small size of 14 MB, it remains our team's favorite CNN for embedded 
applications. 
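
The trade-off behind this preference can be sketched as a simple selection rule: choose the most accurate CNN whose size fits the memory available on the target device. The sizes and accuracies below are taken from Tables 2 and 5 (mixed images set, trained from scratch); the size budgets and the function name are hypothetical.

```python
# Size (MB) from Table 2 and accuracy (%) from Table 5.
MODELS = {
    "VGG 16":       {"size_mb": 528, "accuracy": 98.36},
    "VGG 19":       {"size_mb": 548, "accuracy": 98.51},
    "Inception V3": {"size_mb": 84,  "accuracy": 99.17},
    "Inception V4": {"size_mb": 163, "accuracy": 99.41},
    "MobileNet V1": {"size_mb": 16,  "accuracy": 98.80},
    "MobileNet V2": {"size_mb": 14,  "accuracy": 98.99},
}

def best_model(models, max_size_mb=None):
    """Return the most accurate model within the size budget, or None if none fits."""
    candidates = [name for name, m in models.items()
                  if max_size_mb is None or m["size_mb"] <= max_size_mb]
    return max(candidates, key=lambda name: models[name]["accuracy"], default=None)

unconstrained = best_model(MODELS)                   # no size limit
embedded = best_model(MODELS, max_size_mb=20)        # hypothetical 20 MB budget
mid_range = best_model(MODELS, max_size_mb=100)      # hypothetical 100 MB budget
```

With no size limit the rule returns Inception V4, whereas under a tight budget it returns MobileNet V2, at a cost of about 0.4 percentage points of accuracy.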


4. CONCLUSION 

In this study, we evaluated DL with CNNs for the detection of weeds 
in different crops. To this end, we measured and compared the performance of state-of-the-art CNNs, both 
trained from scratch and pre-trained, on sets of tilted, proximity and mixed images of different cultivated 
plots. According to the results obtained, all the CNNs (VGG 16, VGG 19, Inception V3, Inception V4, 
MobileNet V1 and MobileNet V2) performed very well in the detection of weeds. Inception V4 
trained from scratch obtained the highest results, with an accuracy of 99.18%, 99.45% and 99.41% 
on the sets of tilted, proximity and mixed images, respectively. In its pre-trained version, it had an accuracy 
of 99.51% on the set of mixed images. This shows the effectiveness of using pre-trained and fine-tuned 
CNNs for weed detection. 

MobileNet V2 obtained an accuracy of 98.99% and 99.07% on the mixed image set for 
its version trained from scratch and its pre-trained version, respectively. Owing to its speed and small size 
of 14 MB, MobileNet V2 remains our team's preferred CNN for embedded applications. We conclude that DL 
with CNNs is well suited to weed detection in several types of crops. In addition, this 
approach forms a reliable and inexpensive solution for carrying out localized and ultra-localized spraying 
of herbicides on weeds while sparing crops and areas devoid of vegetation. It can thus contribute to 
increasing agricultural yields and preserving the environment. We plan to continue our research in this area 
on the identification of plants and their various diseases. 